Categorical Data

Analysing the different Modalities

Loading the data

library(tidyverse)  # provides read_csv(), the pipe %>%, dplyr and ggplot2

lyn <- read_csv('lynott_connell_2009_modality.csv')

str(lyn)
1
reading in the data from your computer (remember to store your script and data in the same folder)
2
checking what our data looks like
spc_tbl_ [423 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ PropertyBritish    : chr [1:423] "abrasive" "absorbent" "aching" "acidic" ...
 $ Word               : chr [1:423] "abrasive" "absorbent" "aching" "acidic" ...
 $ DominantModality   : chr [1:423] "Haptic" "Visual" "Haptic" "Gustatory" ...
 $ Sight              : num [1:423] 2.89 4.14 2.05 2.19 1.12 ...
 $ Touch              : num [1:423] 3.684 3.143 3.667 1.143 0.625 ...
 $ Sound              : num [1:423] 1.684 0.714 0.667 0.476 0.375 ...
 $ Taste              : num [1:423] 0.5789 0.4762 0.0476 4.1905 3 ...
 $ Smell              : num [1:423] 0.5789 0.4762 0.0952 2.9048 3.5 ...
 $ ModalityExclusivity: num [1:423] 0.33 0.41 0.555 0.341 0.362 ...
 - attr(*, "spec")=
  .. cols(
  ..   PropertyBritish = col_character(),
  ..   Word = col_character(),
  ..   DominantModality = col_character(),
  ..   Sight = col_double(),
  ..   Touch = col_double(),
  ..   Sound = col_double(),
  ..   Taste = col_double(),
  ..   Smell = col_double(),
  ..   ModalityExclusivity = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
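As a side note, a column can also be parsed as a factor while the file is being read. The following is only a minimal sketch: `csv_text` and `demo` are hypothetical names, and the two rows are a small excerpt of the values shown above, passed as literal text so that the example runs without the CSV file.

```r
library(readr)

# Hypothetical two-row excerpt standing in for the real CSV file.
csv_text <- I("Word,DominantModality,ModalityExclusivity
abrasive,Haptic,0.33
absorbent,Visual,0.41
")

# col_factor() parses this one column as a factor straight away;
# the remaining columns are still guessed by read_csv().
demo <- read_csv(csv_text,
                 col_types = cols(DominantModality = col_factor()),
                 show_col_types = FALSE)

str(demo$DominantModality)
```

Note that `col_factor()` orders the levels by first appearance in the file, whereas `as.factor()` sorts them alphabetically.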

Modifying bits and pieces

lyn <- rename(lyn, Mod = DominantModality)
lyn <- rename(lyn, Excl = ModalityExclusivity)

lyn <- lyn %>% mutate(Mod = as.factor(Mod))

str(lyn)
1
rename the column “DominantModality” to “Mod”
2
rename the column “ModalityExclusivity” to “Excl”
3
convert the renamed column “Mod” to a factor, because it is a categorical variable and R should treat it as one
4
checking what our data looks like
tibble [423 × 9] (S3: tbl_df/tbl/data.frame)
 $ PropertyBritish: chr [1:423] "abrasive" "absorbent" "aching" "acidic" ...
 $ Word           : chr [1:423] "abrasive" "absorbent" "aching" "acidic" ...
 $ Mod            : Factor w/ 5 levels "Auditory","Gustatory",..: 3 5 3 2 4 3 2 5 5 5 ...
 $ Sight          : num [1:423] 2.89 4.14 2.05 2.19 1.12 ...
 $ Touch          : num [1:423] 3.684 3.143 3.667 1.143 0.625 ...
 $ Sound          : num [1:423] 1.684 0.714 0.667 0.476 0.375 ...
 $ Taste          : num [1:423] 0.5789 0.4762 0.0476 4.1905 3 ...
 $ Smell          : num [1:423] 0.5789 0.4762 0.0952 2.9048 3.5 ...
 $ Excl           : num [1:423] 0.33 0.41 0.555 0.341 0.362 ...
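The three modification steps above could also be written as one pipeline. This is a sketch under assumptions: `toy` is a hypothetical three-row stand-in for `lyn`, built from values in the output above, so the example runs on its own.

```r
library(dplyr)

# Hypothetical three-row stand-in for lyn.
toy <- tibble(
  Word = c("abrasive", "absorbent", "aching"),
  DominantModality = c("Haptic", "Visual", "Haptic"),
  ModalityExclusivity = c(0.33, 0.41, 0.555)
)

toy <- toy %>%
  # rename() takes new_name = old_name pairs, several at once
  rename(Mod = DominantModality, Excl = ModalityExclusivity) %>%
  # overwrite Mod with its factor version
  mutate(Mod = as.factor(Mod))

str(toy)
```

Whether you rename in one call or two is a matter of taste; the result is the same.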

What are the modalities?

levels(lyn$Mod)
1
checking the levels of the categorical variable “Mod”
[1] "Auditory"  "Gustatory" "Haptic"    "Olfactory" "Visual"   

Absolute frequencies

counting

lyn %>% count(Mod)
1
counting how many times each level is present
# A tibble: 5 × 2
  Mod           n
  <fct>     <int>
1 Auditory     68
2 Gustatory    54
3 Haptic       70
4 Olfactory    26
5 Visual      205
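If you want the most frequent modality first, `count()` has a `sort` argument, and base R's `table()` gives a quick cross-check. The sketch below assumes a hypothetical stand-in `toy`, rebuilt from the counts printed above, so that it runs without the CSV file.

```r
library(dplyr)

# Rebuild the Mod column from the counts printed above (stand-in for lyn).
mods <- c(Auditory = 68, Gustatory = 54, Haptic = 70, Olfactory = 26, Visual = 205)
toy <- tibble(Mod = factor(rep(names(mods), times = mods)))

toy %>% count(Mod, sort = TRUE)  # most frequent level first

table(toy$Mod)                   # base R equivalent, levels in alphabetical order
```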

plotting

lyn %>% count(Mod) %>%
  ggplot(aes(x = Mod, y = n, fill = Mod)) +
  geom_bar(stat = 'identity')  +
  theme(text=element_text(size=50))
1
counting how many times each level is present
2
plot the data: map Mod to x and n to y, and colour-code the bars with fill
3
tell R that you want a bar plot; stat = ‘identity’ uses the y values you supply as bar heights instead of counting rows
4
leave out, just for the html file :)
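To illustrate what stat = 'identity' is for, here is a sketch of two alternative routes that should give the same bars: letting `geom_bar()` count the rows itself, and using `geom_col()` for values that are already counted. The objects `toy`, `counted`, `p_raw` and `p_counted` are hypothetical names; the data are rebuilt from the counts shown earlier.

```r
library(ggplot2)

# Stand-in for lyn: one row per word, rebuilt from the counts shown earlier.
mods <- c(Auditory = 68, Gustatory = 54, Haptic = 70, Olfactory = 26, Visual = 205)
toy <- data.frame(Mod = factor(rep(names(mods), times = mods)))

# geom_bar() with its default stat counts the rows per level itself,
# so no y aesthetic is needed.
p_raw <- ggplot(toy, aes(x = Mod, fill = Mod)) +
  geom_bar()

# geom_col() expects ready-made heights, like geom_bar(stat = 'identity').
counted <- as.data.frame(table(Mod = toy$Mod), responseName = "n")
p_counted <- ggplot(counted, aes(x = Mod, y = n, fill = Mod)) +
  geom_col()

p_raw
p_counted
```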

Relative frequencies

calculating proportions

lyn %>%
  group_by(Mod) %>%
  summarise(n = n()) %>%
  mutate(rel_freq = n / sum(n))
1
pass the tibble ‘lyn’ on to the upcoming code
2
specify that we group the words according to their modality
3
count the words of each modality
4
divide the number of words per modality by the total number of words
# A tibble: 5 × 3
  Mod           n rel_freq
  <fct>     <int>    <dbl>
1 Auditory     68   0.161 
2 Gustatory    54   0.128 
3 Haptic       70   0.165 
4 Olfactory    26   0.0615
5 Visual      205   0.485 
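The arithmetic behind `rel_freq` can be checked by hand. The sketch below uses only the counts from the table above; `n`, `total` and `rel_freq` are plain vectors here, not columns of `lyn`.

```r
# Counts per modality, copied from the table above.
n <- c(Auditory = 68, Gustatory = 54, Haptic = 70, Olfactory = 26, Visual = 205)

total <- sum(n)        # 423 words in total
rel_freq <- n / total  # same result as prop.table(n)

round(rel_freq, 3)
sum(rel_freq)          # the proportions add up to 1
```

For example, the Visual proportion is 205 / 423, which rounds to 0.485.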

plotting proportions

lyn %>%
  group_by(Mod) %>%
  summarise(n = n()) %>%
  mutate(rel_freq = n / sum(n)) %>%
  ggplot(aes(x = Mod, y = rel_freq, fill = Mod)) +
  geom_bar(stat = 'identity') +
  theme(text=element_text(size=50))
1
pass the tibble ‘lyn’ on to the upcoming code
2
specify that we group the words according to their modality
3
count the words of each modality
4
divide the number of words per modality by the total number of words
5
specify x and y, and colour coding (modalities)
6
specify the plot type which is a bar plot
7
leave out, just for the html file :)

plotting percentages

lyn %>%
  group_by(Mod) %>%
  summarise(n = n()) %>%
  mutate(rel_freq = n / sum(n)) %>%
  ggplot(aes(x = Mod, y = rel_freq * 100, fill = Mod)) +
  geom_bar(stat = 'identity') +
  labs(y = "Percentages", x = "Modalities of sensory words") +
  theme(text=element_text(size=50))
1
pass the tibble ‘lyn’ on to the upcoming code
2
specify that we group the words according to their modality
3
count the words of each modality
4
divide the number of words per modality by the total number of words
5
specify x and y (the relative frequency times 100), and colour coding (modalities)
6
specify the plot type which is a bar plot
7
rename the x- and y-axis labels
8
leave out, just for the html file :)
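An alternative to multiplying by 100 is to keep the proportions and let the axis format them as percentages. This is a sketch, assuming the scales package (installed together with ggplot2); `props` and `p` are hypothetical names, and the counts are copied from the table of absolute frequencies.

```r
library(ggplot2)

# Counts per modality, copied from the table of absolute frequencies.
props <- data.frame(
  Mod = c("Auditory", "Gustatory", "Haptic", "Olfactory", "Visual"),
  n = c(68, 54, 70, 26, 205)
)
props$rel_freq <- props$n / sum(props$n)

p <- ggplot(props, aes(x = Mod, y = rel_freq, fill = Mod)) +
  geom_col() +
  # format the axis ticks as percentages; the data stay proportions
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Percentages", x = "Modalities of sensory words")

p
```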